We survey a new way to get quick estimates of the values of simple statistics (such as count, mean, standard deviation, maximum, median, and mode frequency) on a large data set. This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling, by reasoning about various sets that contain a population of interest. Our antisampling techniques have connections to those of sampling (and have duals in many cases), but they have different advantages and disadvantages, so antisampling is sometimes preferable to sampling and sometimes not. In particular, antisampling techniques can be efficient only when the data is stored in a computer, and they exploit computer science ideas such as production systems and database theory. Antisampling also requires the overhead of constructing an auxiliary structure, a database abstract. Tests on sample data show performance similar to or better than that of simple random sampling. We also discuss more complex methods of sampling and their disadvantages.
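To make the idea of reasoning about sets that contain the population concrete, here is a minimal sketch in Python. It is not the report's method. It assumes, purely for illustration, that a database abstract stores the count, minimum, and maximum of a few precomputed sets, and that the population of interest is known to lie inside every one of them. The names `SetStats` and `bound_population` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SetStats:
    """Precomputed statistics for one set (hypothetical abstract entry)."""
    count: int
    minimum: float
    maximum: float


def bound_population(supersets):
    """Bound the statistics of any population contained in every given set.

    A subset cannot have more items than its smallest superset, and its
    values cannot fall outside the range of any superset.
    """
    count_upper = min(s.count for s in supersets)
    min_lower = max(s.minimum for s in supersets)
    max_upper = min(s.maximum for s in supersets)
    return {
        "count_upper": count_upper,
        "min_lower": min_lower,
        "max_upper": max_upper,
        # The mean of a nonempty population lies within its value range.
        "mean_range": (min_lower, max_upper),
    }


# Illustrative numbers only: two sets known to contain the population.
set_a = SetStats(count=500, minimum=10.0, maximum=300.0)
set_b = SetStats(count=200, minimum=50.0, maximum=400.0)
bounds = bound_population([set_a, set_b])
```

No data item is read here: the bounds follow from the stored set statistics alone, which is why the auxiliary structure must be built in advance.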
Contributor (corporate): Naval Postgraduate School (U.S.)
Sponsorship: Prepared for Chief of Naval Research